Goto

Collaborating Authors

 low-rank tensor regression




Near Optimal Sketching of Low-Rank Tensor Regression

Li, Xingguo, Haupt, Jarvis, Woodruff, David

Neural Information Processing Systems

This problem is motivated by the fact that the number of parameters in $\Theta$ is only $R \cdot \sum_{d 1} D p_D$, which is significantly smaller than the $\prod_{d 1} {D} p_d$ number of parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors $\Theta$, as well as the Tucker decomposition. We obtain a significantly smaller dimension and sparsity in the randomized linear mapping $\Phi$ than is possible for ordinary least squares regression. Finally, we give a number of numerical simulations supporting our theory. Papers published at the Neural Information Processing Systems Conference.


ISLET: Fast and Optimal Low-rank Tensor Regression via Importance Sketching

Zhang, Anru, Luo, Yuetian, Raskutti, Garvesh, Yuan, Ming

arXiv.org Machine Learning

In this paper, we develop a novel procedure for low-rank tensor regression, namely \underline{I}mportance \underline{S}ketching \underline{L}ow-rank \underline{E}stimation for \underline{T}ensors (ISLET). The central idea behind ISLET is \emph{importance sketching}, i.e., carefully designed sketches based on both the responses and low-dimensional structure of the parameter of interest. We show that the proposed method is sharply minimax optimal in terms of the mean-squared error under low-rank Tucker assumptions and under randomized Gaussian ensemble design. In addition, if a tensor is low-rank with group sparsity, our procedure also achieves minimax optimality. Further, we show through numerical studies that ISLET achieves comparable or better mean-squared error performance to existing state-of-the-art methods whilst having substantial storage and run-time advantages including capabilities for parallel and distributed computing. In particular, our procedure performs reliable estimation with tensors of dimension $p = O(10^8)$ and is $1$ or $2$ orders of magnitude faster than baseline methods.


Near Optimal Sketching of Low-Rank Tensor Regression

Li, Xingguo, Haupt, Jarvis, Woodruff, David

Neural Information Processing Systems

We study the least squares regression problem $\min_{\Theta \in \RR^{p_1 \times \cdots \times p_D}} \| \cA(\Theta) - b \|_2^2$, where $\Theta$ is a low-rank tensor, defined as $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$, for vectors $\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$. %$R$ is small compared with $p_1,\ldots,p_D$, Here, $\circ$ denotes the outer product of vectors, and $\cA(\Theta)$ is a linear function on $\Theta$. This problem is motivated by the fact that the number of parameters in $\Theta$ is only $R \cdot \sum_{d=1}^D p_D$, which is significantly smaller than the $\prod_{d=1}^{D} p_d$ number of parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors $\Theta$, as well as the Tucker decomposition. For both models we show how to apply data dimensionality reduction techniques based on {\it sparse} random projections $\Phi \in \RR^{m \times n}$, with $m \ll n$, to reduce the problem to a much smaller problem $\min_{\Theta} \|\Phi \cA(\Theta) - \Phi b\|_2^2$, for which $\|\Phi \cA(\Theta) - \Phi b\|_2^2 = (1 \pm \varepsilon) \| \cA(\Theta) - b \|_2^2$ holds simultaneously for all $\Theta$. We obtain a significantly smaller dimension and sparsity in the randomized linear mapping $\Phi$ than is possible for ordinary least squares regression. Finally, we give a number of numerical simulations supporting our theory.


Tensor Regression Meets Gaussian Processes

Yu, Rose, Li, Guangyu, Liu, Yan

arXiv.org Machine Learning

Low-rank tensor regression, a new model class that learns high-order correlation from data, has recently received considerable attention. At the same time, Gaussian processes (GP) are well-studied machine learning models for structure learning. In this paper, we demonstrate interesting connections between the two, especially for multi-way data analysis. We show that low-rank tensor regression is essentially learning a multi-linear kernel in Gaussian processes, and the low-rank assumption translates to the constrained Bayesian inference problem. We prove the oracle inequality and derive the average case learning curve for the equivalent GP model. Our finding implies that low-rank tensor regression, though empirically successful, is highly dependent on the eigenvalues of covariance functions as well as variable correlations.